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Abstract 



Systemic approaches to the study of a biological cell or tissue rely increasingly on the 
use of context-specific metabolic network models. The reconstruction of such a model 
from high-throughput data can routinely involve large numbers of tests under different 
conditions and extensive parameter tuning, which calls for fast algorithms. We present 
FASTCORE, a generic algorithm for reconstructing context-specific metabolic network mod- 
els from global genome- wide metabolic network models such as ReconX. FASTCORE takes 
qv . as input a core set of reactions that are known to be active in the context of interest (e.g., 

<3\ \ cell or tissue) , and it searches for a flux consistent subnetwork of the global network that 

contains all reactions from the core set and a minimal set of additional reactions. Our key 
observation is that a minimal consistent reconstruction can be defined via a set of sparse 
modes of the global network, and FASTCORE iteratively computes such a set via a series 
of linear programs. Experiments on liver data demonstrate speedups of several orders 
of magnitude, and significantly more compact reconstructions, over a chief rival method. 
Given its simplicity and its excellent performance, FASTCORE can form the backbone of 
many future metabolic network reconstruction algorithms. 



1 Introduction 

Cell metabolism is known to play a key role in the pathogenesis of various diseases |11] such 
as Parkinson's disease [28J and cancer [18]. The study of human metabolism has been greatly 
advanced by the development of computational models of metabolism, such as Recon 1 |12j . 
the Edinburgh human metabolic network [T7], and Recon 2 [37]. These are genome-scale 
metabolic network models that have been reconstructed by combining various sources of 
'omics' and literature data, and they involve a large set of biochemical reactions that can be 
active in different contexts, e.g., different cell types or tissues [36]- 

To maximize the predictive power of a metabolic model when conditioning on a specific 
context, for instance the energy metabolism of a neuron or the metabolism of liver, recent 
efforts go into the development of context-specific metabolic models [9] Ell El E31 [2]. These 
are network models that are derived from global models like Recon 1, but they only contain 



a subset of reactions, namely, those reactions that are active in the given context. Such 
context-specific metabolic models are known to exhibit superior explanatory and predictive 
power than their global counterparts [21 (, I14|, f5| . 

Most algorithms for context-specific metabolic network reconstruction first identify a rel- 
evant subset of reactions according to some 'omics' information (typically expression data 
and bibliomics), and then search for a subnetwork of the global network that satisfies some 
mathematical requirements and contains all (or most of) these reactions [21 [32], [2lj [TJ [I9l [2] . 
The mathematical requirements are typically imposed via flux balance analysis, which char- 
acterizes the steady-state distribution of fluxes in a metabolic network via linear constraints 
that are derived from the stoichiometry of the network and physical conservation laws [311 
[M[ [29| [15j [13] . The search problem may target the optimization of a specific functionality of 
the model (e.g., biomass production) or some other objective [4j, and it may involve repeated 
tests under different conditions and parameter tuning [3j [HI 127] , The latter calls for fast 
algorithms. 

We present fastcore, a generic algorithm for context-specific metabolic network recon- 
struction, fastcore takes as input a core set of reactions that are supported by strong 
evidence to be active in the context of interest. Then it searches for a flux consistent sub- 
network of the global network that contains all reactions from the core set and a minimal set 
of additional reactions. Flux consistency implies that each reaction of the network is active 
(i.e., has nonzero flux) in at least one feasible flux distribution [31|. ITj . An attractive feature 
of fastcore is its generality: As it only relies on a preselected set of reactions and a simple 
mathematical objective (flux consistency), it can be applied in different contexts and it allows 
the integration of different pieces of evidence ('multi-omics') into a single model. 

Computing a minimal consistent reconstruction from a subset of reactions of a global 
network is, however, an NP-hard problem [JJ, and hence some approximation is in order. Our 
key observation is that a minimal consistent reconstruction can be defined via a set of sparse 
modes of the global network, and fastcore is designed to compute a minimal such set. 
Every iteration of the algorithm computes a new sparse mode via two linear programs that 
aim at maximizing the support of the mode inside the core set while minimizing that quantity 
outside the core set. fastcore's search strategy is in marked contrast to related approaches, 
in which the search for a minimal consistent reconstruction involves, for instance, incremental 
network pruning [21] . fastcore is simple, devoid of free parameters, and its performance is 
excellent in practice: As we demonstrate on experiments with liver data, fastcore is several 
orders of magnitude faster, and produces much more compact reconstructions, than the main 
competing algorithm MBA |21j . 

2 Methods 

2.1 Background 

A metabolic network of m metabolites and n reactions is represented by an m x n stoichio- 
metric matrix S, where each entry Sij contains the stoichiometric coefficient of metabolite % 
in reaction j. A flux vector v € M. n is a tuple of reaction rates, v = (v\, . . . ,v n ), where V{ 



is the rate of reaction i in the network. Reactions are grouped into reversible ones (1Z) and 
irreversible ones (I). For a reaction i £ X holds Uj > 0; this and other imposed flux bounds, 
e.g., lower and upper bounds per reaction, are denoted by B (which defines a convex set). 
A flux vector is called feasible or a mode if it satisfies a set of steady state mass-balance 
constraints that can be compactly expressed as: 

Sv = 0, v£B. (1) 

An elementary mode is a feasible flux vector v 7^ with minimal support; that is, there is no 
other feasible flux vector w/0 with supp(w) C supp(v), where supp(v) denotes the support 
(i.e., the set of nonzero entries) of v |3H I15| . A metabolic network model is called (flux) 
consistent if each reaction in the network is in the support of some mode, that is, for each 
reaction i there exists a mode v with Vi 7^ (in practice \vi\ > e, for some small positive 
thresholds) |3HP- 

2.2 Network consistency testing 

Given a metabolic network model with stoichiometric matrix S, a problem of interest is to 
test whether the network is consistent or not. Additionally, if the network is inconsistent, it 
would be desirable to have a method that detects the inconsistent part. 

It has been suggested that network consistency can be detected by a single linear program 
(LP) [lj. The idea is to first convert each reversible reaction into two irreversible reactions 
(and define a reversible flux as the difference of two irreversible fluxes), and then test if the 
minimum feasible flux on the new set J of irreversible-only reactions is strictly positive (in 
practice, at least e). This is equivalent to testing if the following LP is feasible: 



(LP-2) 



This test of consistency, however, can produce spurious solutions. In Figured] we show a toy 
metabolic network comprising four metabolites (A,B,C,D) and six reactions annotated with 
corresponding fluxes v±,. . . ,vq, each bounded by \v{\ < 3. All stoichiometric coefficients are 
equal to one, except for the reaction — >2A. The only reversible reaction is AoB, which is a 
dead-end reaction and therefore inconsistent, whereas all other reactions are irreversible and 
consistent. After converting A-f-^B to a pair of irreversible reactions, ILP-21 achieves optimal 
value z* = 1.5, which implies (wrongly) that the network is consistent. The test here fails 
because the two irreversible copies of Af>B have equal flux at the solution, thereby nullifying 
the actual net flux of A-f>B. 

A straightforward solution to the problem would involve iterating through all reactions, 
computing the maximum and minimum feasible flux of each reaction via an LP that satisfies 
the constraints in (JT]). This is the idea behind the FVA algorithm and the reduceModel 
function of the COBRA toolbox [24, [30]. However, iterating through all reactions can be 
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Figure 1: A metabolic network with one inconsistent reaction (Af>B) 



inefficient. A faster variant is fastFVA [IB], which achieves acceleration over FVA via LP 
warm-starts. Another fast algorithm is CMC |21j . which involves a series of LPs, where each 
LP maximizes the sum of fluxes over a subset J of reactions: 

max \ v j 

j&J (LP-3) 

s.t. Sv = veB. 

The set J is initialized by J = 1Z U I (all reactions in the network), and it is updated after 
each run of lLP-3l so that it contains the reactions whose consistency has not been established 
yet. When J cannot be reduced any further, we can reverse the signs of the columns of S 
corresponding to the reversible reactions in J and resume the iterations. Eventually, all 
remaining reactions may have to be tested one by one for consistency, as in FVA. Such an 
iterative scheme is complete, in the sense that it will always report consistency if the network 
is consistent, and if not, it will reveal the set of inconsistent reactions. However, as we will 
clarify in the next section. lLP-31 is not optimizing the 'correct' function, which may result in 
unnecessarily many iterations. For example, when applied to the network of Figure [H ILP-31 
will pick up the elementary mode that corresponds to the pathway A— >C— >D (because this 
pathway achieves maximum sum of fluxes v\ + v^ + V5 + v 6 = 1.5 + 3 + 3 + 3), and it will 
set t>3 = 0. To establish the consistency of the reaction A— >T), an additional run of ILP-31 
would be needed, where the set J would only involve the reactions AoB and A— >D. Hence, 
an iterative algorithm like CMC that relies on ILP-31 would need two iterations to detect the 
consistent part of this network. However, one LP suffices in this example, as we explain in the 
next section. In more general problems involving larger and more realistic networks, CMC 
may involve unnecessarily many iterations, as we demonstrate in the experiments. 

2.3 Fast detection of consistent reactions 

In most problems of interest there will be no single mode that renders the whole network 
consistent, and an iterative algorithm like the one described in the previous section must be 
used. For performance reasons it would therefore be desirable to be able to establish the 
consistency of as many reactions as possible in each iteration of the algorithm. 

Since consistency implies nonzero fluxes, it is sufficient to optimize a function that just 
'pushes' all fluxes away from zero. Formally, this amounts to searching for modes v whose 
cardinality — denoted by card(v) and defined as card(v) = \supp(v)\, i.e., the number of 
nonzero entries of v — is as large as possible. Directly maximizing card{v) is, however, not 



straightforward. First, the card function is quasiconcave only for v € R™ (the nonnegative 
orthant), and it is nonconvex for general v € R n |5]. Second, even if we restrict attention to 
nonnegative fluxes in each iteration (which we can do without loss of generality by flipping 
the signs of the corresponding columns of S), it is not obvious how to efficiently maximize the 
quasiconcave card{v). Third, in practice consistency implies fluxes that are e-distant from 
zero, in which case some adaptation of the card function is in order. 

The above suggests the following approximation to the card function for nonnegative 
fluxes. Note that, when v > 0, the cardinality function can be expressed as 



card(v) = ^9{vi) , (4) 

where 9 : R — > {0, 1} is a step function: 

*(*) = ( ° i l Vi = ° (5) 

The key idea is to approximate the function 9 by a concave function that is the minimum of 
a linear function and a constant function: 

0K)«min{^,l}, (6) 

where e is the flux threshold. Introducing an auxiliary nonnegative variable Zi € R+ for each 

flux variable w, and taking epigraphs [6], it is easy to verify that, for constant e, the function 

card{v) can be approximately maximized over an arbitrary set J of nonegative fluxes via the 

following LP: 

max 
v.z 







Zi e [0, e] 


Vie J 


Vi > Zi 


Vie J 


Sv = 


veB. 



s.t. Zi e [0,e\ Vi e J, z t £ R+ (LP-7) 



Note that ILP-71 tries to maximize the number of feasible fluxes in J whose value is at least e 
(contrast this with ILP^2j) . 

Returning to the network of Figure HJ if J comprises all network reactions, then note 
that the flux vector [^1,^2,^3,^4,^5, Vq] = [e,0,e,e,e,2e] is an optimal solution of ILP-71 
Hence, a single run of the latter can establish the consistency of all reactions (except V2) 
of that network. More generally, a single run of ILP-71 on an arbitrary subset J of a given 
network will typically identify all consistent irreversible reactions of J . The intuition is 
that ILP^7l prefers flux 'splitting' over flux 'concentrating' in order to maximize the number of 
participating reactions in the solution, which, in the case of irreversible reactions, corresponds 
to flux cardinality maximization. 

By construction, the above approximation of the cardinality function applies only to 
nonnegative fluxes. In order to deal with reversible reactions that can also take negative fluxes, 
we can embed ILP-71 in an iterative algorithm (as in the previous section), in which reversible 



reactions are first considered for positive flux via ILP-71 and then they are considered for 
negative flux. The latter is possible by flipping the signs of the columns of the stoichiometric 
matrix that correspond to the reversible reactions under testing, in which case the fluxes of the 
transformed model are again all nonnegative, and the above approximation of the cardinality 
function can be used. This gives rise to an algorithm for detecting the consistent part of a 
network that we call fastcc (for fast consistency check). Since fastcc is just a variant of 
fastcore, we defer its detailed description until the next section. 

2.4 Context-specific network reconstruction 

The reconstruction problem involves computing a minimal consistent network from a global 
network J\f (with stoichiometric matrix S) and a 'core' set C of reactions that are known 
to be active in a given context. Formally, given a consistent network J\f and a set C C TV, 
the problem is to find the smallest consistent subnetwork A Q J\f such that C C A. This 
problem is known to be NP-complet4j[lJ, suggesting that a practical solution should entail 
some approximation. 

Our approach hinges on the observation that a consistent subnetwork A of M can be 
defined via a set of modes of N: 

Theorem 1. Let V be a set of modes of M, and let A = U^gy supp{v) be the union of the 
supports of these modes. The network model defined by A and Sj,, the submatrix of S that 
contains only the columns indexed by A, is consistent. 

Proof. For each v G V, let va be the 'truncated' v after dropping all dimensions not indexed 
by A. Clearly, £4^4 = 0, therefore each vj^ is a mode in the reduced model {A, Sj}. By 
construction of A, each reaction in A is in the support of some v G V, and hence also in the 
support of some mode U4 of the reduced model. □ 

This simple result allows one to cast the reconstruction problem as a search problem over 
sets of modes of the global network W: 

min card(A) 
V 

s.t. A = M supp(v) /g) 

CQA. 

This is still a difficult (combinatorial optimization) problem, but it is amenable to useful 
approximations. In particular, in fastcore we search over sets of modes V via a greedy 
scheme that incrementally adds modes to V, reminiscent of greedy heuristics for the related 
set covering problem [10J. Further, as a means to approximately minimize card(A), each 



1 Acufia et al. [T] prove NP-completeness of this problem by noting that a special case involves C being the 
empty set, in which case the problem comes down to finding the smallest elementary mode of N , which is 
NP-complete Q]. However, this leaves open the case of a nonempty core set C, in which a solution need not 
constitute an elementary mode. We conjecture that the problem remains NP-hard when C is nonempty, but 
we are not pursuing this question here. 



Input: A consistent metabolic network M — TL Ul with stoi- 
chiometric matrix S, and a set C C M. 
Output: A consistent subnetwork AQM such that CCi. 
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function fastcore( N, C ) 

J^Cnl, T^Af\C, A^9 

flipped 4— false, singleton «— false 
A <- A U FindSparseMode( J, V, false ) 
J<-C\A 
while J / do 
V <^V\A 

A <r- A U FindSparseMode( J, P, singleton ) 
if J n A / then 

i7 ■<— i7 \ .4, flipped <— false 
else 

if flipped then 

flipped 4- false, singleton <— true 
else 

flipped <— true 
if singleton then 

J <- J"(l) (the first element of J) 
else 

/J^/7 
end if 
for each i £ J \X do 

flip the sign of the i'th column of S and 
swap the upper and lower bounds of v% 
end for 
end if 
end if 
end while 
return A 
end function 



Input: A set J C C, a penalty set V C A/\C, 

and the singleton flag (s) . 
Output: The support of a "P-sparse mode. 
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Figure 2: The fastcore algorithm for context-specific metabolic network reconstruction. 



added mode is constrained to have sparse support outside C. This is implemented via Li- 
norm minimization, which is a standard approach to computing sparse solutions to (convex) 
optimization problems [61 [22] . 

The overall fastcore algorithm is shown in Figure [2j The algorithm maintains a set 
J CC that is initialized with the irreversible reactions in C, and a 'penalty' set V = (Af\C)\A 
that contains all reactions outside C that have not been added yet to the set A. Each iteration 
adds to the set A the support of a mode that is sparse in V, computed by the function 
FindSparseMode. The latter first involves an ILP-71 to compute an active subset /C of J~, 



and then the following Li-norm minimization LP constrained by the set fC: 



mm 
v.z 



s.t. Vi G [-Zi,Zi] \/i eV, Zi e M+ (LP-9) 

Vi> e Vi € /C 

St* = v <E B . 

The ILP-91 minimizes Y^iep \ v i\i the Li norm of fluxes in the penalty set V (expressed via 
epigraphs), subject to a minimum flux constraint on the set /Cg The algorithm first goes 
through the I D C reactions, then the 1ZP\ C ones, and eventually through each individual 
reaction (variable singleton). The flipped variable ensures that a reversible reaction is tested 
in both the forward and negative direction. The algorithm terminates when all reactions in 
C have been added to A, which is guaranteed since in the main loop the set J never expands 
(step llOp and the global network M is consistent. Note that fastcore has no free parameters 
besides the flux threshold e. 

The FASTCC algorithm for detecting the consistent part of an input network (see pre- 
vious section) can be viewed as a variant of fastcore(A/',A/') in which the steps [1014141 of 
FindSparseMode are omitted (and there is no V set). It is easy to verify that FASTCC is 
complete, in the sense that it will always report consistency if the network is consistent, and 
if not, it will reveal the set of inconsistent reactions. 

The closest algorithm to FASTCORE is the MBA algorithm of Jerby et al. [2TJ. MBA takes 
as input two core sets of reactions, and it searches for a consistent network that contains all 
reactions from the first set, a maximum number of reactions from the second set (for a given 
tradeoff), and a minimal number of reactions from the global networko Both FASTCORE and 
MBA involve a search for a minimal consistent subnetwork, however the search strategy of 
fastcore is very different to MBA: Whereas FASTCORE iteratively expands the active set A 
starting with A = 0, MBA starts with A = M and iteratively prunes the set A by checking 
whether the removal of each individual reaction (selected in random order) compromises 
network consistency. Consistency testing in MBA is carried out with the CMC algorithm 
that is based on ILP-31 as explained earlier. Hence, fastcore's search strategy differs to 
MBA in two key aspects: First, consistency testing in FASTCORE involves the maximization 
of flux cardinality (JLP-7P instead of sum of fluxes (|LP-3|) , which results in fewer LP iterations 
(see experiments). Second, the search for compact solutions in FASTCORE involves Li-norm 



2 Some care is needed to preempt false negative solutions arising from the minimization of L\ norm in ILP-91 
For example, suppose in the network of Figure [T] that M comprises all reactions except A-H-B, and C = J — 
K, = {6} and V — {1, 3, 4, 5}. In this case. ILP^9l could settle to a solution [vi, V3, va, wgjUe] = [f , e, 0, 0, e]. The 
flux vi, being below e, would be treated as zero by FindSparseMode, in which case the reaction — >-2A would 
be erroneously excluded from the reconstruction. A simple way to avoid this is to use a scaled version of e (we 
used 10 5 e) in the second constraint of lLP-91 with an equal scaling of all flux bounds in J\f '. 

3 FASTCORE can be easily adapted to work with multiple core sets, by introducing a set of weights that 
reflect the confidence of each reaction to be active in the given context, and adding appropriate regularization 
terms in the objective functions of ILP-71 and ILP-91 that capture the given tradeoff. We will address this variant 
in future work. 



Table 1: Comparing fastcc to fastFVA [16] and CMC [21] on three input models 

c-Mouse (|A/"| = 2432) c-Reconl (\Af\ = 2469) c-Recon2 (|A/"| = 5834) 



#LPs 



time 



#LPs 



time 



#LPs 



time 



fastFVA 


4864 


9 


4938 


9 


11668 


207 


CMC 


184 


11 


49 


2 


42 


11 


FASTCC 


2 


0.2 


9 


0.4 


19 


5 



minimization instead of pruning. The advantage of the former is that it can be encoded by a 
single LP, resulting in significant speedups. 

3 Results 

We report results on two sets of problems, the first involving consistency verification of an 
input model, and the second involving the reconstruction of a context-specific model from an 
input model and a core set of reactions. The fastcore algorithm was implemented in the 
COBRA toolbox [3D], using Matlab 2013a and the IBM CPLEX solver (version 12.5.0.0). Test 
runs were performed on a standard 1.8 GHz Intel Core i7 laptop with 4 GB RAM running 
Mac OS X 10.7.5. In all experiments we used flux threshold e =le-4. The software is available 



from http : //bio . uni . lu/systems_biology/sof tware/f astcore/ 



3.1 Consistency testing 

In the first set of experiments we applied fastcc, the consistency testing variant of fastcore, 
for consistency verification of three input models, and compared it against the FastFVA 
algorithm of Gudmundsson and Thiele [16], and an own implementation (based on fastcc 
but with ILP-31 replacing ILP-7J) of the CheckModelConsistency (CMC) algorithm of Jerby 
et al. [21]. We also tested the FVA algorithm of the reduceModel function of the COBRA 
toolbox [30], and the MIRAGE algorithm of Vitkin and Shlomi [38], but we do not include 
them in the results as they performed worse than the reported ones. The three input models 
were the following: 

• c-Mouse (|A/"| = 2432), the consistent part (after removing all inconsistent reactions, in 
total 1295) of the genome-wide metabolic model of mouse, developed by Sigurdsson et 
al. 



• c-Reconl (|A/"| = 2469), the consistent part of Reconl [12]. (Recon 1 was found to 
contain 1273 inconsistent reactions.) 

• c-Recon2 (|A/"| = 5834), the consistent part of Recon 2 [37]. (Recon 2 was found to 
contain 1606 inconsistent reactions.) 



in seconds 



Table 2: Comparing fastcore to MBA [21] on liver model reconstruction from c-Reconl 
liver core set (\C\ = 1069) strict liver core set (\C\ = 1083) 





\A\ 


iif: 


#LPs 


tim<^ 


\A\ 


IR 


#LPs 


time 


MBA 

FASTCORE 


1826 
1746 


1573 
1546 


72279 
20 


7383 
1 


1888 
1818 


1630 
1627 


71546 
20 


6730 
1 



The results are shown in Table [TJ fastcc is using a significantly smaller number of LPs 
than the other two algorithms, and is in general faster. We note that fastFVA is based on a 
highly optimized Matlab/C++ implementation with LP warm-starts, while fastcc is based 
on standard Matlab. These results confirm the appropriateness of flux cardinality (JLP-7P as 
a metric for network consistency testing, in agreement with the theoretical analysis and the 
discussions above. 

3.2 Reconstruction of a liver model 

In the second set of experiments, we used the fastcore algorithm to reconstruct a liver 
specific metabolic network model from the consistent part of Recon 1 (c-Reconl, \Af\ = 2469), 
and we compared against an own implementation of the MBA algorithm of Jerby et al. [21] . 
We applied the two algorithms in two settings. The first setting involves the liver specific 
input reaction set of Jerby et al. [21 j . which is based on 779 'high' core and 290 'medium' 
core reactions (the latter set is supported by weaker biological evidence than the former). To 
allow a comparison with fastcore, we defined a single core set as the union of the high and 
medium core reaction sets, and we applied the two algorithms on this core set. The second 
setting uses the 'strict' liver model of Jerby et al. [21j . which contains 1083 high core reactions 
and no medium core reactions, and therefore allows a direct comparison with fastcore. 

The results for the two settings are shown in Table [2] (note that for MBA, the reported 
number of LPs and the runtime refer to a single pruning iteration of the algorithm; the 
corresponding figures for the full MBA algorithm could be 1000-fold higher). In both settings, 
fastcore is several orders of magnitude faster than MBA, achieving a full reconstruction of 
a liver specific model in about one second, using a much smaller number of LPs. As MBA 
employs a greedy pruning strategy for optimization, the number of LPs that it uses and its 
total runtime can be very high, as also indicated by Wang et al. [39J who reported runtime 
of a single pruning pass of MBA in the order of 10 hours on a 2.34 GHz CPU computer. The 
reconstructed models by fastcore are also considerably more compact than those obtained 
by MBA, with a difference of 70-80 non-core reactions. For the liver model, 1683 out of the 
1746 reactions (96%) of the fastcore reconstruction appear also in the MBA reconstruction, 
whereas for the strict liver model the common reactions are 1743 out of 1818 (also 96%). 

To evaluate fastcore's performance in correctly identifying liver reactions, we performed 



5 number of intracellular reactions 

6 the reported time (in seconds), as well as the number of LPs, refer to a single pruning step of MBA; the 
runtime of the complete MBA algorithm would be 1000-fold. 
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Figure 3: Applying fastcore on random core sets from c-Reconl 



a five-fold cross-validation in which fastcore was used to reconstruct liver metabolism based 
on a reduced, randomly selected set of 80% core reactions. A hyper geometric test was used 
to test for the enrichment of the left-out core reactions in the added non-core reactions. 
Repeating this cross-validation procedure 500 times, we obtained a median hypergeometric 
p- value of 0.0021, indicating the ability of fastcore to capture missing liver specific reactions. 
We also tested fastcore on 1000 randomly generated core sets drawn uniformly from 
c-Reconl, of random size drawn uniformly from {l,0.9|jV|} (where \J\f\ = 2469). Almost 
every reconstruction was obtained between 0.5 and 2 seconds (plot omitted). In Figure [3] 
we show the size of the reconstructed model as a function of the size of the core set, where 
we observe a remarkably small conditional variance. We will further investigate the latter in 
future work. 



3.3 Reconstruction of a murine macrophage model 

We also used the fastcore algorithm to build a cell-type specific murine macrophage model 
from the consistent part of Reconlbio comprising |jV| = 2474 reactions. Reconlbio (\M\ = 
3745) is a modified Reconl model that contains three extra reactions (biomass, NADPOX, 
and a sink reaction to balance the glycogenin self-glucosylation reaction) [5]. We used a 
core set comprising 300 (out of 382) proteomics derived Raw264.7 macrophage reactions, as 
described by Bordbar et al. [5]. (The remaining 82 reactions could not be added to the core set 
as they are situated in an inconsistent region of Recon 1 and therefore carry a permanent zero 
net flux.) For their macrophage reconstruction, Bordbar et al. used, among other methods, 
GIMMEp — a variant of the GIMME algorithm [3j that is similar to the MBA algorithm — and 
they obtained a network model containing 1026 intracellular reactions. Our main interest was 
to investigate whether fastcore can obtain a functional network that is at least as compact 
as the one obtained with GIMMEp. fastcore generated (in about one second and using 11 
LPs) a consistent network model of 953 reactions, 831 of which are intracellular reactions. 
This is a much more compact model than the one obtained with GIMMEp. 
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4 Discussion 

fastcore is a generic algorithm for context-specific metabolic network reconstruction from 
genome-wide metabolic models, and it was mainly motivated by requirements of fast compu- 
tation and compactness of the output model. 

The key advantage of having a fast reconstruction algorithm is that it permits the exe- 
cution of multiple runs in order to optimize for extra parameters or test different core sets 
extracted from the input data [14]. For example, when working with gene expression data, 
the definition of the core set may depend on the threshold used to segregate between high 
expression genes (core reactions) and low expression genes (non-core reactions) [3]. As the 
choice of threshold is rather arbitrary, a practical approach could involve evaluating the ro- 
bustness of the output model as a function of the chosen threshold, fastcore can perform 
this analysis in a few minutes, whereas for the same problem other algorithms would need 
hours or days. (Algorithms like GIMME or GIMMEp that require manual curation and as- 
sembly of subnetworks, would also fail in this kind of task.) Another example where fast 
computation is imperative is cross-validation. In the current study (see experiments) we ran 
a cross-validation procedure 500 times, an operation that took a few minutes with fastcore 
but that would barely be manageable with other reconstruction algorithms. Other examples 
where fast computation is important are time-course experiments or experiments involving 
different patients or conditions [20j . There fastcore could more easily identify differential 
models over time and/or input conditions. 

Compactness is a key concept in various research areas of biology, such as the minimal 
genome [261 125] . Notwithstanding, the requirement of model compactness seems to be in 
disagreement with the observation that biological systems are fairly redundant and this re- 
dundancy serves a specific purpose, namely, the fast adaptation to changes in the environment. 
Alternative pathways that perform similar functions are known to be expressed in different 
environmental conditions, allowing for instance to metabolize another type of sugar when 
glucose is not available [35] • At any rate, the pursuit of compactness in metabolic network 
reconstruction need not be in conflict with the notion of redundancy. Alternative pathways 
will be included in a reconstructed model as long as 'redundant' reactions that are supported 
by biological evidence are included in the core set. 
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